5 Steps to Build a Question Answering PDF Chatbot: LangChain + OpenAI + Panel + HuggingFace.

  • Published: Dec 11, 2024

Comments • 114

  • @alexkim1919 1 year ago +3

    I didn't know about Panel... thanks for sharing that.

  • @geraldofrancisco5206 1 year ago

    Your video is the very best of its kind: clear, complete, concise, and it assumes the viewer knows nothing (I surely don't). Thank you VERY much for this.

    • @SophiaYangDS 1 year ago +1

      Thanks so much for your support 🙏😊

  • @ubaisalih2987 1 year ago

    Best video on YouTube explaining PDF Q&A with OpenAI and building a final app. Thank you very much!

  • @alokrajsidhaarth7130 1 year ago +1

    great video!

  • @MachineMinds_AI 1 year ago +1

    Thanks for sharing, Sophia. Added to our playlist.

  • @paulmuscat2770 1 year ago

    Thanks!

    • @SophiaYangDS 1 year ago

      Thanks so much 🙏❤ my first super thanks! Really appreciate it!

  • @rafacanseco 1 year ago +4

    Sophia, this was a perfect video; everything is super clear. I subscribed to your channel right away, and right now I'm reading the Python Panel app post on your Medium. Your content is amazing. Congratulations.

  • @CinematicAdventureOne 1 year ago +1

    Wow, thanks Sophia! This is exactly something I was looking for, nice tutorial and explanation.

    • @SophiaYangDS 1 year ago +1

      Thanks so much 🙏 glad it helped 😊

  • @LucesLab 1 year ago

    Great video, Sophia. Thanks for sharing!!

  • @jornjat 1 year ago

    Thanks a lot for sharing your wisdom! Hopefully will be Sophia-enabled to make use of LangChain in a project...

  • @1littlecoder 1 year ago +1

    Great video. Definitely gives me FOMO for not having explored Panel much :)

    • @SophiaYangDS 1 year ago

      You could try it today 😊 Panel is the best

  • @danasugu1767 1 year ago +2

    Thanks, Sophia. Can you make a video on using Karpathy's NanoGPT instead of OpenAI? A question-answering PDF app using LangChain + NanoGPT.

  • @TheBontenbal 1 year ago

    Great video, Sophia!

  • @RedCloudServices 1 year ago +2

    Sophia, can you make a video showing how to bind the response to a Panel visualization, and how to store your X/Y values as variables based on user data or inputs?

  • @yohanzeta 1 year ago +5

    Are there any limitations to this as far as the number of pages?

  • @anujcb 1 year ago

    awesome tutorial by the way

  • @pablonavarro6263 1 year ago

    Super cool video, you're very helpful.

  • @marcinkrol633 1 year ago

    Great video editing, thanks for the tutorial.

  • @adilmajeed8439 1 year ago

    Thanks for sharing the working model

  • 1 year ago

    Perfect! Thank you so much, 💯 Sophia Yang!

  • @andfanilo 1 year ago +2

    Very enjoyable video :D Love the Panel + QA app!
    Also, I don't know why, but with some very soft lofi jazz lounge coffee music in the background, I could definitely see this as a coding livestream 😁

    • @SophiaYangDS 1 year ago +1

      haha thanks! I was too tired to add any music. Great suggestion though.

    • @andfanilo 1 year ago

      @@SophiaYangDS I think it's fine to not put music here, some people don't like music for purely educational content 🙂 for a chill reading/coding livestream though, that'd be dope
      Rest well in between all the work commitments. 화이팅! (I've been watching too much k-drama recently XD)

    • @SophiaYangDS 1 year ago +1

      @@andfanilo 화이팅!

  • @happyday.mjohnson 1 year ago +1

    I tried this on my health plan. I asked what the deductible amount was. I got all sorts of answers but not the exact amount until I told it to look in the Benefits Details section to find it. Then it answered correctly. I was hoping I could finally figure out how much every "touch" to my health/health plan would cost and what options are best...thoughts on tuning would be awesome. I'll keep plodding along. Thank you for this video.

  • @jaewoochung4954 1 year ago +1

    Hi Sophia,
    Thanks for the great video. You mentioned that you don't have to use OpenAI as the LLM in the RetrievalQA step. I've been messing around with HuggingFace, and none of the models I've tried has been able to spit out a coherent or correct answer. Do you have any recommendations for a HuggingFace model that can at least somewhat replicate OpenAI's performance on this task? I'm definitely missing something here. Thanks so much.
    P.S. Part of the problem is that I'm not sure which of OpenAI's models we're calling, since the code is just llm=OpenAI(), so I don't know what types of models to look for. I've tried text generation and text2text generation, but they don't work well, and question answering models don't give the human-like response I'm looking for.

    • @mustafadabah7377 1 year ago

      Hi! I have the same issue. Did you find any solution, or a good Hugging Face model that can help instead of OpenAI?

  • @arthurperini 1 year ago

    Great video, thank you. Is it really necessary to use LangChain for this? I was building a chatbot but gave up on LangChain, because the plain OpenAI functions example works on its own, and LangChain's long prompt was spending so many tokens.

  • @rudolfbumm8126 1 year ago

    Great content. Can you comment on how to programmatically predict or estimate the charge for each question asked?


  • @atombarako 1 year ago +1

    Thanks for the great video tutorial. I ran into two problems:
    1. Python 3.11 would not install chromadb; downgrading to 3.10 worked.
    2. file_input.save("/.cache/temp.pdf") does not work. ChatGPT helped me solve it:
       current_dir = os.getcwd()
       cache_dir = os.path.join(current_dir, '.cache')
       if not os.path.exists(cache_dir):
           os.makedirs(cache_dir)
       pdf_file = os.path.join(cache_dir, 'temp.pdf')
    Then change the line to: file_input.save(pdf_file)

    • @SophiaYangDS 1 year ago

      Thanks for pointing that out! Yeah, I created the .cache directory and gave it permissions in the Dockerfile when hosting on Hugging Face Spaces. To run it locally, you can change ".cache/temp.pdf" to "temp.pdf".

    • @oguzhanzobar1094 1 year ago

      @@SophiaYangDS Yes! I was encountering the same issue. I will try both of these.
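[Editor's note] A portable variant of the fix discussed above (a sketch; the `pdf_qa_cache` directory name is arbitrary) writes the upload to the system temp directory, so it works both locally and in a container without pre-creating `.cache`:

```python
import os
import tempfile

# Use a per-user temp directory instead of a hard-coded ".cache" folder,
# so no special permissions are needed.
cache_dir = os.path.join(tempfile.gettempdir(), "pdf_qa_cache")  # arbitrary name
os.makedirs(cache_dir, exist_ok=True)  # no error if it already exists
pdf_path = os.path.join(cache_dir, "temp.pdf")

# In the app, replace file_input.save("/.cache/temp.pdf") with:
# file_input.save(pdf_path)
```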

  • @emmanuelkolawole6720 1 year ago

    Also, where did you specify the model you want to use for the embedding? E.g., GPT-3.5 Turbo, GPT-4, etc.

    • @SophiaYangDS 1 year ago

      You can define "llm=xxx" and "embedding=xxx" in the qa function
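[Editor's note] A minimal sketch of that reply, using the classic pre-0.1 LangChain API from around the time of the video (the function name and parameters are illustrative, and an `OPENAI_API_KEY` is assumed at run time):

```python
def build_qa(documents, k=2):
    """Sketch: a RetrievalQA chain with the LLM and embedding model made explicit."""
    # Imports are local so this file still loads without LangChain installed.
    from langchain.chains import RetrievalQA
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.llms import OpenAI
    from langchain.vectorstores import Chroma

    embedding = OpenAIEmbeddings()                   # the "embedding=xxx" choice
    db = Chroma.from_documents(documents, embedding)
    llm = OpenAI(temperature=0)                      # the "llm=xxx" choice; swap in another model here
    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=db.as_retriever(search_kwargs={"k": k}),
        return_source_documents=True,
    )
```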

  • @teesimshu2971 1 year ago

    If we use a fine-tuned model, do we still need to upload the PDF each time we query it using the same GPT API account?

  • @VenkatesanVenkat-fd4hg 1 year ago

    Thanks for your valuable video. How can we get a boolean (yes/no) QA response? Any suggestions...

  • @ronakdinesh 1 year ago

    Great, thank you for the video. Is there a way to load multiple PDFs?

    • @SophiaYangDS 1 year ago

      Yes, you can put all the files in a list and use a for loop to load multiple PDFs.

  • @parekhnikunj 8 months ago

    How would you compare Streamlit vs. Panel?

  • @davidwu3247 1 year ago

    Thank you for the video!
    Do you think it'd be possible to edit the PDF based on user responses (i.e., input new data) and then output a new PDF file?

    • @SophiaYangDS 1 year ago

      Yes, you just need to define a function to change the content and save it to a new file.

  • @JCSantiago 1 year ago

    Also, for some reason, when I upload the PDF it won't load the content in the box. I don't see any errors. How can I fix this?

  • @ScottTaylor-ir6kv 1 year ago

    So, for a solution such as this, how would one account for the fact that the PDF(s) could contain personal information, such as HIPAA (health) information or financial information? From a security perspective, how do you handle or account for that? I assume the contents are uploaded to the cloud, so they would be exposed and at risk... yes?

    • @SophiaYangDS 1 year ago

      Yeah, it is a concern for sure. OpenAI sees all the data; that's why some people prefer to use a local model. Speaking of the cloud, I think the government uses the cloud too, so private info is likely already on the cloud 😅

  • @denniskampien987 1 year ago

    Can it hold contextual memory across the whole document, or just for the text chunks it receives after semantic search?

    • @SophiaYangDS 1 year ago

      There are multiple ways to do question answering. Check out my previous video: ruclips.net/video/DXmiJKrQIvg/видео.html. In this case, the language model only sees the relevant text. You can pass all the text to the language model as well; it will just cost a lot of money.

  • @pierredsa6809 1 year ago

    Great video, thanks!
    I was wondering, would it be possible to make an example of a csv_agent with memory?
    I tried:
        agent = create_csv_agent(OpenAI(temperature=0), 'toto.csv', pandas_kwargs={'sep': ";"}, verbose=True)
        AgentExecutor.from_agent_and_tools(agent, tools, verbose=True, memory=memory)
    but the constructor fails.

  • @ananayaggarwal7909 1 year ago

    After 12:06, what did you do on the black screen?

    • @SophiaYangDS 1 year ago

      I opened the app address (localhost:5006/LangChain_QA_Panel_App) in the browser

  • @sandeepsaha 1 year ago

    How do you change the model? - say I want to use GPT-4.

    • @SophiaYangDS 1 year ago

      You can define llm=xxx. Check out my previous video where I went through a few different models. I don't have access to GPT-4, so I haven't tried it yet: ruclips.net/video/kmbS6FDQh7c/видео.html

  • @anujcb 1 year ago

    If I have to load the PDF files from Google Drive, how can we do that?

  • @8eck 1 year ago

    How long does it take to generate an answer?

  • @jcrsantiago 1 year ago +1

    How can I load an entire folder instead of a single pdf?

    • @SophiaYangDS 1 year ago +1

      Yes, input_documents can be a list. You can write a for loop to loop through all the files in a folder.

    • @MusaTalhaUnal 1 year ago +1

      @@SophiaYangDS Can I load multiple PDFs from the Panel upload directly on localhost? Or should I create a directory first and collect the PDFs in a list? Could you write some example code for it? Thanks for everything; it's a really useful tool.
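[Editor's note] A minimal sketch of the folder approach described in the replies above (helper names are illustrative; the actual loading assumes the classic `langchain.document_loaders.PyPDFLoader`):

```python
import glob
import os

def list_pdfs(folder):
    """Return every PDF path in a folder, sorted for a reproducible order."""
    return sorted(glob.glob(os.path.join(folder, "*.pdf")))

def load_all_pdfs(folder):
    """Load every PDF in the folder into one flat list of documents."""
    from langchain.document_loaders import PyPDFLoader  # local import: optional dependency
    docs = []
    for path in list_pdfs(folder):
        docs.extend(PyPDFLoader(path).load())  # one document per page
    return docs
```

The combined `docs` list can then be split, embedded, and passed to the QA chain exactly like a single PDF's documents.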

  • @massibob2004 1 year ago

    Hello, I don't understand: what is the limit on the number of input files?

  • @jameslin7457 1 year ago

    How can I use GPT-3.5 specifically as the language model? The one you showed is just OpenAI(); which model does it use when it's not specified?

    • @SophiaYangDS 1 year ago

      Yes, you can specify llm = ChatOpenAI(model_name='gpt-3.5-turbo'). Check out the first part of my LangChain intro video on how to use LangChain with many different model providers: ruclips.net/video/kmbS6FDQh7c/видео.html

    • @boriskozel3094 1 year ago

      @@SophiaYangDS Maybe OpenAI, not ChatOpenAI?

  • @anujcb 1 year ago

    I tried to host the same code on Hugging Face, but I'm getting the error below:
    failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "panel": executable file not found in $PATH: unknown

    • @SophiaYangDS 1 year ago

      I'm not sure. Did you duplicate the space?

    • @anujcb 1 year ago

      @@SophiaYangDS Never mind, that was a silly spelling mistake. Now it is finding the libs and building. Thank you though!

  • @chrischen7451 1 year ago

    Great video. Which country are you in right now?

  • @fixelheimer3726 1 year ago

    Is there a way to run this kind of method with Langchain without using online services?

    • @jackleung0748 1 year ago

      You can run the code locally with Python, e.g. in VS Code.

  • @fgfanta 1 year ago

    The "Code in this video" linked in the video description cannot be accessed.

  • @aleksandarmilivojevic8641 1 year ago

    Thanks for the video. I tried your online app, but for some reason I get IndexError('list index out of range'). Why is that? Am I doing something wrong? I uploaded a 3-page PDF, put in my OpenAI key, put 1 for chunks, and clicked "run".

  • @anujcb 1 year ago

    Is it possible to modify this to take the file(s) from a Google Drive folder?

    • @SophiaYangDS 1 year ago

      Yes, LangChain has a Google Drive loader: python.langchain.com/en/latest/modules/indexes/document_loaders/examples/googledrive.html

  • @KemalCanKara 1 year ago

    Hi, thank you for this great video. Can I ask you something? What is the max length of a PDF, in pages, for this kind of job if we use, let's say, GPT-4?

    • @kamiartik6479 1 year ago

      There is no limit, because the similarity calculation happens outside of GPT. Once the relevant information is found in the document, it is fed into GPT. So as long as the chunk size is less than the GPT-4 limit (it is better not to use big chunks anyway), you can use documents of any length.
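[Editor's note] The token budget behind that explanation can be sketched as simple arithmetic (all numbers are illustrative; the exact prompt overhead depends on the chain's template):

```python
def fits_context(chunk_tokens, k, prompt_overhead, answer_budget, context_limit):
    """True if k retrieved chunks plus the prompt and answer fit the model's window."""
    return k * chunk_tokens + prompt_overhead + answer_budget <= context_limit

# Four 1,000-token chunks fit comfortably in an 8k-token window...
print(fits_context(1000, 4, 200, 500, 8000))   # True
# ...but eight 2,000-token chunks do not.
print(fits_context(2000, 8, 200, 500, 8000))   # False
```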

  • @shubhambaghel219 1 year ago

    It also gives some responses out of context (outside the PDF). Has anyone faced or observed the same thing? If yes, how can we stop this? I have tried prompts with the chain, but that didn't work.

    • @SophiaYangDS 1 year ago

      In the code, I asked it to output the answer and the relevant chunks of text. Is that what you see? You can remove the relevant chunks of text (return_source_documents=False, among others).

    • @shubhambaghel219 1 year ago

      @@SophiaYangDS No, I think you misunderstood my question. I am seeing that the model answers questions that are not in the PDF documents, e.g. "Who is Elon Musk?" The model answers using GPT's trained knowledge instead of using the PDFs only. I want to restrict this.
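[Editor's note] A common mitigation for this behavior is a restrictive prompt template passed to the chain via chain_type_kwargs (a sketch using the classic LangChain API; the refusal wording is illustrative, and models can still occasionally ignore such instructions):

```python
# Template that instructs the model to refuse questions outside the context.
RESTRICT_TEMPLATE = (
    "Answer the question using ONLY the context below. "
    "If the answer is not contained in the context, reply exactly: "
    "\"I don't know based on the provided documents.\"\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def make_restricted_prompt():
    """Wrap the template for RetrievalQA.from_chain_type(..., chain_type_kwargs={'prompt': ...})."""
    from langchain.prompts import PromptTemplate  # local import: optional dependency
    return PromptTemplate(template=RESTRICT_TEMPLATE,
                          input_variables=["context", "question"])
```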

  • @surajkhan5834 1 year ago

    Can we do this in Node.js?

  • @peeturpain9379 1 year ago

    Great video. Since I don't have a paid subscription to OpenAI's API, can you please detail how I can use other models from Hugging Face to replicate this? Especially the Llama 2 model. Thanks.

  • @mertozlutiras 1 year ago +1

    I think it's not obvious in the video, but embedding all the chunks when you first load the document should take some time.

  • @user-wr4yl7tx3w 1 year ago

    Can we try another LLM besides OpenAI, given the cost?

  • @christiancarpinelli 1 year ago

    Hi Sophia, excellent video as always; I'm a big fan of your content, and I'm sure you'll grow a lot in the YT tech space.
    I would like your thoughts on this one. I'm building a chatbot that helps my users get information about a functionality and execute some actions via API... I was thinking of having the GPT-3.5-turbo Chat API as an "orchestrator": if the user wants information, redirect the request to a query on a vector DB to get useful chunks of info, feed those to GPT-4, and return an appropriate response to the user's question; if instead the user wants to execute an action, redirect the request to GPT-4 and the LangChain OpenAPI Agent to execute it and return the result to the user.
    What do you think about this approach? Any suggestions?

    • @SophiaYangDS 1 year ago

      Thanks so much for the support! Appreciate it! I'm actually not sure if I'm following your idea. Are you in the LangChain Discord? Might be a good place to get feedback on your ideas : )

    • @christiancarpinelli 1 year ago

      @@SophiaYangDS Yes, I am on the Discord! But I didn't get much feedback and wanted to hear your thoughts on it.
      Let me explain better. Basically, I want to create a chatbot that uses the ChatGPT API. This chatbot needs to support normal conversations, but also be able to respond using internal documents (this part is pretty clear; you made excellent tutorials on that). The chatbot should also allow the user to interact with some APIs of the platform... Now, that part is also pretty clear, but my issue is integrating these two use cases into a single chat experience. Hope that makes it clearer, and thank you for the response!

    • @SophiaYangDS 1 year ago

      @@christiancarpinelli Sounds like you want to combine the PDF retriever chain with another API chain? If I understand you correctly, I think you could use a sequential chain or write your own logic to combine the two chains.

  • @emmanuelkolawole6720 1 year ago

    Can you set this up with the Vicuna AI model? That is the true test, because not everyone wants to send their data to OpenAI.

    • @SophiaYangDS 1 year ago

      You can use llama.cpp with LangChain.

  • @davidzhang4825 1 year ago

    Does this work with data in Excel or Google Sheets?

    • @SophiaYangDS 1 year ago +1

      Yes, LangChain has a CSV document loader and a GCS document loader. You can try those.

  • @dribbens91 1 year ago

    How do I know what my API key is?!

    • @SophiaYangDS 1 year ago

      You can get your API key from the OpenAI website

  • @mohsenghafari7652 9 months ago

    Hi, please help me: how can I create a custom model from many PDFs in the Persian language? Thank you.

  • @rorororo-z8l 1 year ago

    The app doesn't work; I don't know why.

    • @SophiaYangDS 1 year ago

      I just tried again, and it works for me. Did you set up billing at OpenAI? The OpenAI API only works when billing is set up. It's also possible that the app simply crashed when many people tried it at the same time.

  • @MR_GREEN1337 1 year ago +1

    But again, the OpenAI API key isn't free, so deploying this publicly could cost you a lot of money.

  • @JCSantiago 1 year ago

    Anyone else following along? I tried to run the code but got this error when running panel serve LangChain_QA_Panel_App.ipynb:
    LangChain_QA_Panel_App.ipynb", line 7, in
    "metadata": {},
    ^^^^^^^^^^^
    NameError: name 'get_ipython' is not defined

    • @vinitshah8309 1 year ago

      I got the same error. Have you found a solution?

  • @AI-Consultant 1 year ago

    What are your thoughts on Dolly 2.0?

    • @code4AI 1 year ago +1

      Smile. It is finally open source, with a CC BY-SA 3.0 license for commercial integration. And closed-source LLaMA is history.